Anatomy of Rmd

This is an R Markdown document rendered as an HTML page. R Markdown documents are files that end in .Rmd and include a few different parts:

The YAML header

The YAML header is the section at the very top of our file between the --- symbols. This text doesn’t show up directly in our output, but it tells R Markdown how to render our document.

We can specify multiple output formats in our YAML header, like html_document, pdf_document, and word_document. We can also specify settings for these output formats, to do things like add a floating table of contents. And we can add things like a title, author, and date which may be automatically rendered depending on our output format.
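To make the header settings described above concrete, here is a minimal sketch that writes a small document to a temporary file and reads its header back. The title, author, and option values are hypothetical; `yaml_front_matter()` from the rmarkdown package is used only to show how the header is parsed, and nothing is rendered.

```r
library(rmarkdown)

# A hypothetical minimal document: the YAML header sits between the --- lines
rmd_lines <- c(
  "---",
  "title: \"House Price Index\"",
  "author: \"A. Analyst\"",
  "output:",
  "  html_document:",
  "    toc: true",
  "    toc_float: true",
  "  pdf_document: default",
  "---",
  "",
  "Body text starts here."
)

rmd_file <- tempfile(fileext = ".Rmd")
writeLines(rmd_lines, rmd_file)

# yaml_front_matter() parses only the header and returns it as a named list
header <- yaml_front_matter(rmd_file)

names(header$output)                    # the output formats requested
header$output$html_document$toc_float   # a setting for one of those formats
```

Listing more than one format under `output` is what lets the same file be rendered to several targets.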

Code chunks

The most important part of our R Markdown document, our code, lives in “chunks” throughout the document. Chunks start and end with three backticks (```) and have a dark gray background in RStudio.

The first line of a code chunk also includes information in curly braces about the code and how its output should be displayed:

  • The first and always required item is the language the code is in, e.g. {r} or {python}
  • After the language comes an optional chunk label, e.g. {r setup}
  • Finally, separated by commas, come any chunk options. These control how our code runs and what appears in the output. A few useful ones include:
    • eval: whether to run the code
    • echo: whether to print the code (or just the output)
    • warning, message: whether to print any warnings or messages generated by our code
    • many more

Outside of code chunks, in the body text, we can also create math equations with LaTeX syntax: \(f(k) = {n \choose k} p^{k} (1-p)^{n-k}\)
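The chunk options described above can be written in each chunk's header, or set once as defaults for the whole document. The sketch below shows the second approach with knitr's `opts_chunk`; the particular values and the chunk label in the comment are illustrative assumptions, not settings taken from this document.

```r
library(knitr)

# Document-wide defaults, typically placed in a first chunk labelled "setup".
# A single chunk can still override them in its own header, for example:
#   {r hpi-plot, echo = FALSE}
opts_chunk$set(
  eval    = TRUE,   # run the code
  echo    = TRUE,   # show the code as well as its output
  warning = FALSE,  # hide warnings
  message = FALSE   # hide messages (e.g. package start-up notes)
)

# Inspect the current default for one option
opts_chunk$get("echo")
```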

Markdown

Markdown is how we do all of our other formatting. It was created in 2004 by John Gruber and Aaron Swartz¹ as a way to turn plain text into rich formatted HTML. Markdown syntax is easy to learn, and elements of Markdown are supported in many places (Reddit, Slack, discussion boards).

As you can see throughout this document, you can do a lot of things with Markdown, including:

  • Italicize text by surrounding it with single underscores (_text_) or asterisks (*text*)
  • Bold text by surrounding it with double asterisks (**text**)
  • Insert links with [some text](https://www.rstudio.com/)
  • Render text as inline code with single backticks (`text`)
  • Render text as code blocks with triple backticks (```)

  • Create section headings with # (and subheadings with ##, ###, ####, …)
  • Create ordered lists with 1., 2., 3., … and unordered lists with *, -, or +
  • Create footnotes with ^[footnote text]²
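To see the plain-text-to-HTML conversion in action, the sketch below passes a few of the elements listed above through the commonmark package. This is a stand-in chosen for illustration: R Markdown itself converts documents with pandoc, and the sample text is made up.

```r
library(commonmark)

# A few Markdown elements from the list above, as one plain-text string
md <- paste(
  c(
    "## A subheading",
    "",
    "Some *italic* and **bold** text, a [link](https://www.rstudio.com/), and `inline code`.",
    "",
    "1. first item",
    "2. second item"
  ),
  collapse = "\n"
)

# Convert the Markdown text to an HTML string and print it
html <- markdown_html(md)
cat(html)
```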

We can also do really neat things like create tabs in our HTML doc:

House Price Index

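library(tidyverse)  # dplyr, tidyr, ggplot2, and the %>% pipe
library(readxl)     # read_excel()
library(janitor)    # clean_names()
library(plotly)     # ggplotly()

hpi <- read_excel("../HPI_PO_monthly_hist.xls", skip = 3)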

hpi_wrangled <- hpi %>% 
  clean_names() %>% 
  slice(-1) %>%   # remove empty row
  rename(date = month) %>%
  select(date, ends_with("_sa")) %>%  # only keep seasonally adj. data
  # separate() from tidyr package to split date into separate columns for day/month/year 
  separate(date, into = c("year", "month", "day"), sep = '-', convert = TRUE, remove = FALSE) %>% 
  # unite() from tidyr to join columns 
  unite(yr_mon, year, month, sep = "/", remove = FALSE) %>% 
  mutate(across(where(is.numeric), ~ round(.x, digits = 2))) %>%  # round all numeric columns to 2 digits
  # add labels with case_when()
  mutate(season = case_when(between(month, 3, 5) ~ "spring",
                            between(month, 6, 8) ~ "summer",
                            between(month, 9, 11) ~ "fall",
                            # between(month, 12, 2) ~ "winter" won't work: no number is both >= 12 and <= 2
                            TRUE ~ "winter")) %>% 
  select(date, yr_mon, year, month, day, season, everything()) %>%  # reorder columns
  # arrange() from dplyr to sort rows
  arrange(date)

hpi_tidy <- 
  hpi_wrangled %>% 
  select(date, contains("north"), contains("south")) %>% 
  # pivot_longer makes data long, or tidy
  pivot_longer(-date, names_to = "division", values_to = "hpi") %>% 
  group_by(division)
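The season labelling in the pipeline above relies on `case_when()` falling through to a catch-all branch for winter. Here is a minimal sketch of that logic on made-up month numbers (not the HPI data), showing why the wrapped range in the commented-out line cannot match.

```r
library(dplyr)

# Toy data: one month number from each season, plus December
toy <- tibble(month = c(1, 4, 7, 10, 12))

toy_seasons <- toy %>%
  mutate(season = case_when(
    between(month, 3, 5)  ~ "spring",
    between(month, 6, 8)  ~ "summer",
    between(month, 9, 11) ~ "fall",
    TRUE                  ~ "winter"  # catch-all: months 12, 1, and 2
  ))

# between(x, left, right) tests left <= x & x <= right,
# so a range that wraps around the year end never matches
wrapped <- between(12, 12, 2)

toy_seasons
wrapped
```

Because `case_when()` evaluates its conditions in order, the `TRUE` branch only applies to rows that matched none of the three explicit ranges.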

HPI over time

hpi_plot <- ggplot(hpi_tidy, aes(x = date, y = hpi, color = division)) +
  geom_line()

hpi_plot %>% 
  ggplotly()

Distributions of monthly changes

hpi_pct <- 
  hpi_tidy %>% 
  mutate(pct_change = (hpi / lag(hpi)) - 1,
         pct_change_12_mons = (hpi / lag(hpi, 12)) - 1) %>%
  na.omit()

hpi_pct %>% 
  ggplot(aes(x = pct_change)) +
  geom_histogram(fill = "darkblue", color = "darkred", bins = 50) +
  facet_wrap(~division) +
  theme_minimal() +  # apply the complete theme first so it doesn't reset the tweak below
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
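The percent-change calculation above depends on the data being grouped, so that `lag()` never reaches across divisions. A small sketch with invented numbers (the division names and index values are hypothetical) makes the arithmetic easy to check by hand.

```r
library(dplyr)

# Toy index values for two hypothetical divisions, already sorted by time
toy <- tibble(
  division = rep(c("a", "b"), each = 3),
  hpi      = c(100, 110, 121, 200, 190, 209)
)

toy_pct <- toy %>%
  group_by(division) %>%
  # lag() shifts values down one row within each group,
  # so the first row of every division has no previous value (NA)
  mutate(pct_change = hpi / lag(hpi) - 1) %>%
  ungroup()

toy_pct
```

The first row of each division is `NA`, which is why the original pipeline ends with `na.omit()` before plotting.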

Analysis details

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.